Improvements to Korektor: A Case Study with Native and Non-Native Czech

Authors

  • Loganathan Ramasamy
  • Alexandr Rosen
  • Pavel Straňák
Abstract

We present recent developments of Korektor, a statistical spell checking system. In addition to a lexicon, Korektor uses language models to find real-word errors, which are detectable only in context. The models and error probabilities, learned from error corpora, are also used to suggest the most likely corrections. Korektor was originally trained on a small error corpus and used language models extracted from an in-house corpus, WebColl. We show two recent improvements:

  • We built new language models from freely available (shuffled) versions of the Czech National Corpus and show that these perform consistently better on texts produced both by native speakers and by non-native learners of Czech.
  • We trained new error models on a manually annotated learner corpus and show that they perform better than the standard error model (in error detection) not only for the learners’ texts, but also for our standard evaluation data of native Czech. For error correction, the standard error model outperformed non-native models in 2 out of 3 test datasets.

We discuss reasons for this not-quite-intuitive improvement. Based on these findings and on an analysis of errors in both native and learners’ Czech, we propose directions for further improvements of Korektor.
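As a rough illustration of how a language model and an error model can jointly catch a real-word error, the following is a minimal sketch and not Korektor's actual implementation. It assumes a toy English corpus, a bigram language model with add-k smoothing, and a uniform single-edit error model. All names (`CORPUS`, `log_lm`, `log_error`, `correct`) and all probability values are hypothetical.

```python
import math
from collections import Counter

# Toy training text standing in for a large corpus (hypothetical data).
CORPUS = ("she is taller than him . he is older than her . "
          "we ate and then we left .").split()

LEXICON = set(CORPUS)
UNIGRAMS = Counter(CORPUS)
BIGRAMS = Counter(zip(CORPUS, CORPUS[1:]))


def edit_distance(a, b):
    """Plain Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def log_lm(prev_word, word, k=0.01):
    """Add-k smoothed bigram log-probability log P(word | prev_word)."""
    numerator = BIGRAMS[(prev_word, word)] + k
    denominator = UNIGRAMS[prev_word] + k * len(LEXICON)
    return math.log(numerator / denominator)


def log_error(observed, intended, p_typo=0.05):
    """Toy error model: one fixed probability for any single-edit typo."""
    return math.log(1 - p_typo) if observed == intended else math.log(p_typo)


def correct(prev_word, observed):
    """Pick the lexicon word with the best error-model + language-model score."""
    candidates = [w for w in LEXICON if edit_distance(observed, w) <= 1]
    if not candidates:
        return observed  # nothing within one edit: leave the word unchanged
    return max(candidates,
               key=lambda w: log_error(observed, w) + log_lm(prev_word, w))
```

In this sketch, "then" is a valid lexicon word, so a lexicon lookup alone would accept it. After "taller", however, the bigram score favours "than" strongly enough to outweigh the error-model penalty. After "and", the same word is left unchanged.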

Similar resources

Comparing Lexical Bundles in Hard Science Lectures; A Case of Native and Non-Native University Lecturers

Researchers stated that learning and applying certain set of lexical bundles of native lecturers by non-native lecturers would help students improve their proficiency through incidental vocabulary input. The present study shed light on the lexical bundles in hard science lectures used by Native and Non-native lecturers in international universities with the main purpose of analyzing the structu...

The Use of Lexical Bundles in Native and Non-native Post-graduate Writing: The Case of Applied Linguistics MA Theses

Connor et al. (2008) mention “specifying textual requirements of genres” (p.12) as one of the reasons which have motivated researchers in the analysis of writing. Members of each genre should be able to produce and retrieve these textual requirements appropriately to be considered communicatively proficient. One of the textual requirements of genres is regularities of specific forms and content...

Language Learning and Language Teaching: Episodes of the Lives of Six EFL Teachers in Iran

Teachers are the most important players of every educational system in different societies; accordingly, understanding their personal reflections may help us gain valuable insights into what it means to be a teacher in a specific cultural and social context. The purpose of this case study was to investigate the life and career of 6 non-native English speaking teachers in state educational syste...

The Discursive Construction of “Native” and “Non-Native” Speaker English Teacher Identities in Japan: A Linguistic Ethnographic Investigation

Recent poststructuralist theories of identity posit identities as being discursively constructed in interactions with society, institutions, and individuals. This study used a Linguistic Ethnographic framework to investigate the discursive identity construction of two English teachers, one ‘non-native’ English speaker, and one ‘native’ English speaker, teaching English in a tertiary institution...

Korektor - A System for Contextual Spell-Checking and Diacritics Completion

We present Korektor – a flexible and powerful purely statistical text correction tool for Czech that goes beyond a traditional spell checker. We use a combination of several language models and an error model to offer the best ordering of correction proposals and also to find errors that cannot be detected by simple spell checkers, namely spelling errors that happen to be homographs of existing...

Publication date: 2015